ABSTRACT

Multivariate calibration techniques and their possible application to
sensor array data are reviewed. The progression from multiple linear
regression and principal components regression to partial least squares
is described through an analysis of their respective algorithms and the
data analysis problems each of them solves. Additional analysis models
are introduced to detect underlying background problems for sensor
arrays and to quantitate analytes directly by rank annihilation factor
analysis when two-dimensional data are obtained.


With the arrival of novel types of chemical sensors, the chemist will
have to meet a new challenge in data acquisition and analysis. The
abundant information content that was once common only in areas of
chemistry such as spectroscopy, where hundreds of wavelengths are
scanned, is now available for microsensors. The progress of chemical
sensors from the pH electrode to the CHEMFET has been remarkable. With
microlithography, hundreds of solid state sensors can be combined on a
single chip to form a data gathering array equivalent to a spectrometer.
The next problem facing the user of sensor arrays is that of data
analysis. The complexities of multisensor array responses or outputs
have to be sorted into useful information, since solid state technology
does not eliminate such fundamental problems as matrix effects,
interferences, and backgrounds. The variety of techniques used in
multivariate calibration and quantitation, and how they might apply to
the chemical sensor array, will be discussed in this paper.

Linear  Regression. 

The  single-component,  single-sensor  model  used  in  most  analyses  is  a 
linear  relationship  between  the  response  of  a  sensor  and  the  concentration 
of  an  analyte: 

$r = b_0 + b_1 c + e$  (1)

This model requires that a calibration step be performed to find the
sensitivity of the sensor, $b_1$ (slope of the regression line), and the
r-axis intercept, $b_0$. The residual error $e$ is the portion of the
response not described by the model. The second step in the analysis is
then performed by measuring the responses of unknown samples and then
estimating their concentrations. The assumptions made using least
squares regression are that the calibration plot is linear and the
sensor is fully specific for the analyte of interest (this also assumes
that $b_0$ is zero). The most commonly encountered problems using this
method are matrix effects and the presence of a background response.
Matrix effects are due to a change in the sensitivity coefficient $b_1$
between calibration and quantitation. This type of effect is most
readily treated by using the method of standard additions. A background
or interference correction cannot be directly included in this model,
especially if the interferent is unknown. This suggests the use of
sensor arrays, which are necessary for the analysis of mixtures. Their
advantages include a reduction in analysis time and the ability to use
non-specific sensors.
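
The two steps described above, calibration followed by estimation, can be sketched in a few lines. This is a minimal illustration rather than anything taken from the original work: the function names `calibrate` and `predict_concentration` and the calibration standards are hypothetical, and NumPy's `polyfit` is assumed as the least squares routine.

```python
import numpy as np

def calibrate(conc, resp):
    """Least squares fit of r = b0 + b1*c; returns (b0, b1)."""
    # polyfit returns coefficients from the highest power down: [b1, b0]
    b1, b0 = np.polyfit(conc, resp, deg=1)
    return b0, b1

def predict_concentration(resp, b0, b1):
    """Invert the calibration line to estimate concentrations."""
    return (np.asarray(resp, dtype=float) - b0) / b1

# Hypothetical, noise-free calibration standards (made-up numbers).
c_std = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
r_std = 0.5 + 2.0 * c_std            # intercept 0.5, sensitivity 2.0

b0, b1 = calibrate(c_std, r_std)
c_unknown = predict_concentration([4.5], b0, b1)
```

If the sensitivity changes between calibration and measurement (a matrix effect), the estimate from `predict_concentration` is biased, which is the situation standard additions is meant to address.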

The data obtained from an array of sensors do not differ from those
obtained from more classical analytical instrumentation. For this reason,
the types of data analysis used for sensor arrays do not differ from those
commonly used in, for example, emission spectroscopy. In emission
spectroscopy, the spectrum of an analyte is defined as the emission
intensities of the analyte at a predetermined number of wavelengths. A
more distinguishing spectrum is one in which more wavelengths
are  employed.  An  important  characteristic  that  determines  the  amount  of 
information  contained  in  a  spectrum  is  the  degree  of  orthogonality  or 
independence  of  the  wavelengths.  This  becomes  clear  if  one  considers  the 
hypothetical  case  of  a  spectrum  defined  by  two  wavelengths.  If  one  adds  a 
third  wavelength  whose  intensity  is  always  the  sum  of  the  first  two,  no 
new  information  is  obtained  by  its  inclusion.  For  an  array  of  sensors,  the 
"spectrum" or signature of an analyte is the composite of the responses of each
sensor to the analyte. It is not a true spectrum in that the ordering of the
sensors, and therefore the shape of the spectrum, is arbitrary. In true
spectra, either time or a progression of wavelengths serves to define their
shapes,  although  adjacent  wavelengths  are  often  highly  correlated.  Such  a 
correlation  arises  since  adjacent  wavelengths  are  nearly  equal  in  energy. 
When  this  occurs,  many  wavelengths  must  be  used  to  obtain  information 
that  could  be  contained  in  a  smaller,  more  independent  array  of  sensors.  To 
improve  the  uniqueness  of  an  array  signature,  more  sensors  can  be  used. 
Therefore,  one  powerful  aspect  of  arrays  is  that  sensors  whose  responses 
are  independent  can  be  chosen.  The  mechanisms  which  determine  the 
responses  of  the  individual  sensors  can  be  chosen  to  be  nearly  orthogonal. 
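
The two-wavelength argument above can be checked numerically with the rank of a small response matrix. The sketch below uses made-up random responses; the variable names are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Responses of 6 hypothetical analytes on 2 independent channels.
two = rng.random((6, 2))

# Third channel is always the sum of the first two: no new information.
redundant = np.column_stack([two, two.sum(axis=1)])

# Third channel responds by an unrelated mechanism.
independent = np.column_stack([two, rng.random(6)])

rank_redundant = np.linalg.matrix_rank(redundant)
rank_independent = np.linalg.matrix_rank(independent)
```

The redundant array still has rank two, while the array with an independent third channel reaches rank three.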


Multiple Linear Regression.


One of the most widespread uses of chemometrics is in the estimation of a
model used for calibration. This is often done for a multicomponent mixture
where the concentrations of p analytes in a mixture are sought. Perhaps the
simplest method to understand is the method of multiple linear regression
(MLR) (1). With an array of sensors, a typical problem would be to determine
the concentrations of p analytes in n samples where an array of m sensors
is employed (Figure 1). To solve this problem, two assumptions must first
be made. First, it is assumed that the sensors respond linearly; that is,
doubling the concentration of an analyte will double that analyte's
response from each sensor. Second, it is assumed that the response of a
sensor to two or more analytes is additive. For example, if a sensor has a
response of two and three units to one molar solutions of analytes A and B,
respectively, then its response to a mixture of one molar A and one molar B
will be five units. With these assumptions, the problem can be solved using
MLR.

The response matrix $X$ is an n x m matrix whose rows are the n
samples and whose columns are the responses of the m sensors:

$X K = Y$  (2)

$Y$ is a matrix of unknown concentrations with n samples as rows and p
analytes as columns, and $K$ is the calibration matrix of regression
coefficients. To solve the problem, one must first use a calibration set
($X_0$ and $Y_0$ are the matrices of calibration responses and
concentrations) where the analyte concentrations for the n samples are
known. Knowing $X_0$ and $Y_0$, $K$ can be determined by linear algebra.
If the number of samples does not equal the number of sensors, then $X_0$
is not a square matrix, and its inverse cannot be determined. Thus both
sides of the equation must be multiplied by $X_0^T$ before the inverse is
found.

$K = (X_0^T X_0)^{-1} X_0^T Y_0$  (3)

Once this $K$ is determined, the response matrix $X_{mix}$ obtained from
the unknown mixtures can be used to find the concentrations of the
mixtures.

$Y_{mix} = X_{mix} K$  (4)
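
Equations 3 and 4 can be sketched as follows. The sensitivity values and the function names `mlr_calibrate` and `mlr_predict` are hypothetical. The example assumes as many sensors as analytes, because with noise-free data and more sensors than analytes the matrix $X_0^T X_0$ would be singular; solving the normal equations with `numpy.linalg.solve` instead of forming an explicit inverse is a numerical choice made here, not something stated in the text.

```python
import numpy as np

def mlr_calibrate(X0, Y0):
    """Equation 3: K = (X0^T X0)^-1 X0^T Y0, via the normal equations."""
    return np.linalg.solve(X0.T @ X0, X0.T @ Y0)

def mlr_predict(X_mix, K):
    """Equation 4: Y_mix = X_mix K."""
    return X_mix @ K

# Hypothetical sensitivities: rows are analytes, columns are sensors.
# Linearity and additivity mean response = concentrations @ sens.
sens = np.array([[2.0, 1.0, 0.5],
                 [0.3, 3.0, 1.0],
                 [1.0, 0.2, 2.5]])

rng = np.random.default_rng(1)
Y0 = rng.random((6, 3))        # known concentrations: n = 6, p = 3
X0 = Y0 @ sens                 # noise-free calibration responses: m = 3
K = mlr_calibrate(X0, Y0)

Y_true = np.array([[0.4, 1.2, 0.7]])     # an "unknown" mixture
Y_mix = mlr_predict(Y_true @ sens, K)    # recovered concentrations
```

With noise-free responses the recovered concentrations match the true ones to rounding error; with real data the fit is only a least squares estimate.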

Note  that  this  K  matrix  does  not  contain  information  about  the 
sensitivities  of  the  sensors  to  the  analytes.  Since  this  information  is  often 
important,  the  matrix  of  sensitivities  K+  can  be  derived  as  follows: 

$K^+ = K^T (K K^T)^{-1}$  (5)
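
As a hedged numerical check of equation 5, the sketch below evaluates it for a small, hypothetical square calibration matrix and compares the result with NumPy's Moore-Penrose pseudoinverse, `numpy.linalg.pinv`. Using `pinv` is an assumption made here for illustration; it agrees with the expression in equation 5 whenever $K K^T$ is invertible.

```python
import numpy as np

# Hypothetical calibration matrix for 2 sensors and 2 analytes.
K = np.array([[0.6, -0.1],
              [-0.2, 0.4]])

# Equation 5 evaluated as written.
K_plus_eq5 = K.T @ np.linalg.inv(K @ K.T)

# Moore-Penrose pseudoinverse, which also covers rank-deficient cases.
K_plus_pinv = np.linalg.pinv(K)

# For a square, nonsingular K both reduce to the ordinary inverse.
identity_check = K_plus_eq5 @ K
```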

MLR calculates the best least squares fit describing the data
points for each analyte and each sensor. It is an adequate
method  in  many  cases,  but  in  some  instances  its  application  is  either  not 
appropriate  or  impossible.  In  the  under-determined  case  where  the  number 
of  analytes  (p)  is  greater  than  the  number  of  sensors  (m),  MLR  gives  an 
infinite  number  of  solutions  to  the  problem  and  is  therefore  useless.  When  p 
equals  m,  there  is  one  unique  solution,  and  when  p  is  less  than  m,  more 
information  is  available  and  therefore,  a  better  statistical  fit  is  achieved. 

Another common problem is collinearity, which occurs when the $X_0$
matrix does not have full rank, i.e. some number z of the columns in $X_0$
are nearly linear combinations of the remaining m-z columns. When this
occurs, the inversion procedure becomes sensitive to errors when
calculating $K$, and small errors in an observed sensor response value in
$X$ will result in large errors in the corresponding concentration
estimates in $Y$.
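
One common way to see this sensitivity is the condition number of the response matrix. The sketch below uses made-up responses for two sensors; the choice of `numpy.linalg.cond` as the diagnostic is an assumption made here, not a method named in the text.

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.random(20)          # responses of a first hypothetical sensor
b = rng.random(20)          # responses of an unrelated second sensor

X_good = np.column_stack([a, b])
# Second sensor almost duplicates the first: nearly collinear columns.
X_bad = np.column_stack([a, a + 1e-6 * b])

cond_good = np.linalg.cond(X_good)
cond_bad = np.linalg.cond(X_bad)
```

A large condition number indicates that small errors in the responses can be amplified by roughly that factor when the matrix is inverted.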

Background Identification.

It is assumed in the MLR calibration step that the calibration mixture
contains the same or greater number of analytes as in the unknown mixture.
The unknown presence of a background, an uncalibrated analyte, if not
eliminated, will cause erroneous results in the measurement of a mixture.

A method proposed by Osten and Kowalski for background identification can
be used to test whether further calibration or sample purification is
necessary (2). This technique uses a new matrix $X$ where the first N rows
of $X$ are identical to the $K$ calibration matrix (actually each row
contains the responses of each sensor to a pure analyte) and the (N+1)st
row is the measured responses of a mixture of analytes (e.g. a real
unknown sample).

The method involves the normalization of $X$ by rows (the sum of the
elements of each row equals one) and then mean-centering the data by
subtraction of the mean response of each sensor for the N analytes from
each entry for that sensor in the matrix. The second moment matrix
$X^T X / N$ is calculated using only the first N rows of the normalized,
mean-centered $X$ matrix. Diagonalization of this moment matrix gives
rise to two matrices, one containing the N-1 eigenvalues, $E$, and the
other containing the N-1 eigenvectors, $V$.

Now the scaled and centered mixture response vector, $x_{N+1}$ (the
(N+1)st row of $X$), can be rotated by the eigenvector matrix $V$.

$s_{N+1} = x_{N+1} V$  (6)

The mixture response vector can be estimated from the eigenvectors and the
above scores vector, $s_{N+1}$, by

$\hat{x}_{N+1} = V s_{N+1}$  (7)

The difference between the actual and predicted response vectors results in
the residuals vector $h$.

$h = x_{N+1} - \hat{x}_{N+1}$  (8)

The sums of the squares of the elements of $x_{N+1}$ and $\hat{x}_{N+1}$
are calculated and compared. If the residual vector $h$ contains more
than $1 \times 10^{-3}$% of the original variance in the mixture
response, then a background component is present. The major problem with
sensor array data is that a background can be detected but cannot be
corrected without knowing what the components of the background are in
order to include them in the calibration step.
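
The procedure in equations 6 to 8 can be sketched as below. This is an illustrative reading of the steps described above, not the published implementation: the function name `background_fraction`, the pure-analyte responses, and the background vector are all hypothetical, and the threshold is the $1 \times 10^{-3}$% figure from the text expressed as a fraction.

```python
import numpy as np

def background_fraction(pure, mixture):
    """Fraction of the mixture's sum of squares not explained by the
    eigenvectors of the N pure-analyte responses (equations 6 to 8)."""
    X = np.vstack([pure, mixture]).astype(float)
    X = X / X.sum(axis=1, keepdims=True)   # each row sums to one
    N = pure.shape[0]
    X = X - X[:N].mean(axis=0)             # center on the N pure rows
    C = X[:N].T @ X[:N] / N                # second moment matrix
    evals, evecs = np.linalg.eigh(C)       # eigenvalues in ascending order
    V = evecs[:, ::-1][:, :N - 1]          # N-1 leading eigenvectors
    x = X[N]
    s = x @ V                              # equation 6: scores
    x_hat = V @ s                          # equation 7: estimate
    h = x - x_hat                          # equation 8: residuals
    return (h @ h) / (x @ x)

# Hypothetical responses of 5 sensors to 3 pure analytes.
pure = np.array([[5.0, 1.0, 0.5, 0.2, 0.1],
                 [0.3, 4.0, 1.0, 0.5, 0.2],
                 [0.1, 0.5, 3.0, 2.0, 0.4]])
clean = 0.5 * pure[0] + 2.0 * pure[1] + 1.0 * pure[2]
background = np.array([0.1, 0.1, 0.2, 0.5, 6.0])   # uncalibrated species

threshold = 1e-5                                    # 1 x 10^-3 percent
frac_clean = background_fraction(pure, clean)
frac_bg = background_fraction(pure, clean + background)
```

A mixture built only from calibrated analytes leaves an essentially zero residual, while the sample containing the uncalibrated component exceeds the threshold.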

Principal  Component  Regression. 

An  alternative  to  the  linear  calibration  problem,  principal  component 
regression  (PCR),  couples  factor  analysis  or  principal  component  analysis 
(PCA) with multiple linear regression (3). This method is not as sensitive to
the collinearity problem and aids in the determination of the best solution
in  the  under-determined  case.  PCA  is  a  method  in  which  more  descriptive 
variables  (which  correspond  to  columns)  are  calculated.  These  new 
variables,  which  are  called  principal  components  or  eigenvectors,  are  linear 
combinations  of  the  original  columns.  They  are  more  descriptive  because 
they  are  chosen  to  describe  the  maximum  amount  of  variance  in  a  data 
matrix.  To  illustrate  this,  the  entries  in  each  column  can  be  viewed  as 
defining  a  corresponding  vector's  orientation  in  m-dimensional  space.  The 
first  eigenvector  is  the  vector  whose  direction  describes  the  maximum 
amount  of  variance  out  of  all  of  the  possible  directions.  The  second 
eigenvector  is  by  definition  orthogonal  to  the  first  and  is  the  second  most 
descriptive  direction.  Since  the  eigenvectors  are  in  m-space,  the  maximum 
number of eigenvectors equals the number of columns, m. Often r
eigenvectors or columns, where r<m, can be used to describe all or nearly all
of the variance in a data matrix. When this is true, the n x m matrix can be
reduced to an n x r matrix where the new columns in the reduced matrix are
independent and are the coordinates of the samples in the new coordinate
system defined by the eigenvectors (as opposed to the original columns).

The first step in PCR is to perform PCA on $X$ and calculate a new smaller
matrix $S$ (n x r) using the first r eigenvectors (r<m). This step is identical
to that used in equation 6. A procedure called cross-validation (4)
can also be used to optimally choose the number of components when
reduction of dimensionality is necessary.

The next step is to perform MLR of $S$ onto $Y$.

$S K = Y$  (9)

Because the columns of $S$ are orthogonal, there is no collinearity problem,
and  the  number  of  eigenvectors  used  can  be  chosen  to  accommodate  the 
number  of  rows  or  samples  present.  Of  course  the  best  approach  is  to 
always  be  over-determined  (m>p),  but  where  this  is  not  possible,  the  best 
result  is  that  in  which  the  maximum  amount  of  information  is  used  to  derive 
the  solution. 
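
The two PCR steps can be sketched as follows, with PCA carried out through the singular value decomposition. The function names `pcr_fit` and `pcr_predict` and the sensitivity values are hypothetical, and no mean-centering is applied, which is a simplifying assumption of this sketch. The example uses five sensors and two analytes with noise-free data, a case where $X^T X$ is singular and plain MLR would fail.

```python
import numpy as np

def pcr_fit(X, Y, r):
    """PCA on X, then MLR of the n x r score matrix S onto Y (equation 9)."""
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    V_r = Vt[:r].T                          # first r eigenvectors, m x r
    S = X @ V_r                             # scores, n x r
    K = np.linalg.solve(S.T @ S, S.T @ Y)   # S^T S is diagonal here
    return V_r, K

def pcr_predict(X_new, V_r, K):
    """Project new responses onto the eigenvectors, then apply K."""
    return (X_new @ V_r) @ K

# Hypothetical sensitivities: 2 analytes (rows) on 5 sensors (columns).
sens = np.array([[2.0, 1.0, 0.5, 0.2, 0.1],
                 [0.3, 3.0, 1.0, 0.5, 0.2]])

rng = np.random.default_rng(3)
Y0 = rng.random((8, 2))           # known calibration concentrations
X0 = Y0 @ sens                    # rank-2 response matrix with 5 columns
V_r, K = pcr_fit(X0, Y0, r=2)

Y_true = np.array([[0.3, 1.1]])
Y_pred = pcr_predict(Y_true @ sens, V_r, K)
```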

When using an array of sensors, PCA can have another important use.
Performing PCA on the $X$ matrix will yield new columns that are linear
combinations of the original columns, as stated before. The loadings or
contributions of the original columns to the new columns can be examined to
determine the information content of the original columns. Since these
columns correspond to the responses of individual sensors, an analyst can
use the results of PCA to determine whether or not a sensor is useful (5).
Informative sensors will load heavily into the first r eigenvectors, where r
is chosen such that the first r eigenvectors describe a predetermined
amount of the variance in the data set. In this way many sensors can be
tested simultaneously, and the sensors that contribute the most to the first
r eigenvectors are selected to form the array. The best approach is to
select the major contributor to the first eigenvector as the first sensor, the
major contributor to the second eigenvector as the second sensor, and so on.
One would not choose all of the sensors that loaded into the first
eigenvector, even though it is the most descriptive, because columns that
load into the same eigenvectors are usually highly correlated and therefore
would cause collinearity problems. The successive eigenvectors are
orthogonal; therefore the major contributors to the successive eigenvectors
are also more nearly orthogonal.
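
The selection rule described above (the major contributor to each successive eigenvector) can be sketched as follows. The function name `select_sensors` and the test data are hypothetical; mean-centering before PCA and skipping a sensor that has already been chosen are assumptions of this sketch.

```python
import numpy as np

def select_sensors(X, r):
    """Pick, for each of the first r eigenvectors, the sensor (column)
    with the largest absolute loading that has not been chosen yet."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    chosen = []
    for k in range(r):
        order = np.argsort(-np.abs(Vt[k]))       # largest loading first
        pick = next(int(j) for j in order if int(j) not in chosen)
        chosen.append(pick)
    return chosen

# Two orthogonal, zero-mean underlying response patterns (made up).
t1 = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
t2 = np.array([1.0, 1.0, -1.0, -1.0, 1.0, 1.0, -1.0, -1.0])

# Sensors 0 and 1 follow t1 (correlated); sensors 2 and 3 follow t2.
X = np.column_stack([5.0 * t1, 1.0 * t1, 2.0 * t2, 0.1 * t2])

selected = select_sensors(X, r=2)
```

Here sensors 0 and 1 both load into the first eigenvector, but only the stronger one is kept, and the second choice comes from the second eigenvector.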

Partial Least Squares.

One of the latest regression procedures to be developed is that of partial
least squares (PLS). PLS was first described in the middle of the 1960s by
Herman Wold (6). It was used moderately in the fields of econometrics,
sociology and psychology during the seventies (7). The first use in
chemistry was reported by H. Wold, B. Gerlach and B. Kowalski in the late
seventies (6, Vol. 2, chapt. 9). Since then, the groups of Svante Wold at Umeå
University (Umeå, Sweden) and Harald Martens at the Norwegian Food
Research Institute (Ås, Norway) have been refining and specializing the
method for chemical applications. References 8, 9 and 10 give a
description of that work.

The  PLS  method  of  regression  is  based  on  the  properties  of  multiple 
linear regression (MLR) and of principal component analysis (PCA). It is
considered to combine the best of both methods. The important aspects of the PLS
method  are: 

-Model  building 

-Prediction  (Figure  2) 

-Parsimony 


One important field of use of PLS is multivariate calibration in
analytical chemistry. Applications in this field can easily be extended to
the calibration of sensors. Up to now, almost all chemical applications have
used the 2-block PLS model, where the response and concentration matrices,
Figure 1, are considered blocks of data. One advantage of PLS is that
more than one response block can be regressed against the concentration
block. This aspect of PLS has not yet been applied in a chemical
experiment.

The 2-block PLS uses the same data structure as MLR, except that the
regression algorithm decomposes both blocks into sums of simpler
matrices. Principal Component Regression (PCR) is the case where only the
X-block is decomposed and the regression is modeled from the X-block scores
against the Y-block, equation 9. In PLS, rotated factors called latent
variables are used for the regression part instead of principal components.
The variables are rotated to optimize the correlation between the
scores of both blocks, as in canonical correlation. Although it is
impossible to describe completely how PLS works in this paper, interested
readers are referred to a tutorial on PLS written by Geladi and Kowalski
(11). Table I gives a comparison between MLR, PCR and PLS.
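
For readers who want a concrete starting point, the sketch below shows one widely used variant, PLS with a single dependent variable (often called PLS1) computed by NIPALS-style deflation. It is not the algorithm as presented in the cited tutorial and is offered only under stated assumptions: a single analyte, mean-centered data, and hypothetical names (`pls1_fit`, `b_true`) and data.

```python
import numpy as np

def pls1_fit(X, y, n_components):
    """PLS1 by successive deflation. Returns (x_mean, y_mean, b) so that
    y_hat = y_mean + (X_new - x_mean) @ b."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean          # centered blocks
    m = X.shape[1]
    W = np.zeros((m, n_components))        # weights
    P = np.zeros((m, n_components))        # X loadings
    q = np.zeros(n_components)             # y loadings
    for a in range(n_components):
        w = E.T @ f                        # direction of max covariance
        w /= np.linalg.norm(w)
        t = E @ w                          # latent variable scores
        tt = t @ t
        p = E.T @ t / tt
        q[a] = f @ t / tt
        E = E - np.outer(t, p)             # deflate the X block
        f = f - q[a] * t                   # deflate the y block
        W[:, a], P[:, a] = w, p
    b = W @ np.linalg.solve(P.T @ W, q)    # regression vector
    return x_mean, y_mean, b

rng = np.random.default_rng(4)
X = rng.random((12, 3))                    # 12 samples, 3 sensors
b_true = np.array([1.0, -2.0, 0.5])        # made-up coefficients
y = X @ b_true + 3.0                       # noise-free concentrations

x_mean, y_mean, b = pls1_fit(X, y, n_components=3)
X_new = rng.random((2, 3))
y_hat = y_mean + (X_new - x_mean) @ b
```

With as many latent variables as sensors and noise-free data, the result coincides with the ordinary least squares solution; the practical benefit of PLS appears when fewer latent variables are retained.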

Table I.

A comparison of MLR, PCR and PLS

     MLR                      PCR                      PLS

1.   #samples > #sensors      no requirement           no requirement

2.   works best with          accepts collinear        accepts collinear
     orthogonal sensors       sensors                  sensors

3.   matrix inversion         matrix inversion         no matrix inversion
     is difficult             is easy

4.   no data on matrix        matrix condition         matrix condition
     condition                data available           data available

5.   block data is not        factor analysis part     factor analysis part
     analyzed                 allows classification    allows classification
                              and pattern recognition  and pattern recognition

6.   sensitive to noise       sensitive to noise       separates noise from
                                                       relevant information

7.   multiple dependent       multiple dependent       makes meaningful linear
     variables are treated    variables are treated    combinations in
     independently            independently            dependent block

PLS as a regression method can be used no matter how many
variables (sensors) there are in the X and Y blocks, and collinearity
problems can be avoided. PLS and PCR, by their nature, give data on the
condition of the X-block matrix. For MLR this would require an extra
calculation step that is almost never carried out by "black-box" MLR
users. One of the main advantages of PLS over PCR, and a reason for its
applicability to sensor arrays, is that it can separate noise from useful
information.

Rank Annihilation Factor Analysis.

The  analytical  chemist  is  frequently  confronted  with  the  problem  of 
analyzing  complex  mixtures  in  which  he  is  only  interested  in  the 
concentration  of  a  few  components.  It  would  be  convenient  if  quantitative 
information  could  be  obtained  for  the  analytes  of  interest  without  worrying 
about the rest of the sample components. Second order bilinear sensors, i.e.
sensors that give a two dimensional data matrix of the form
$M_{ij} = \sum_k \beta_k x_{ik} y_{jk}$, are specially suited for this
purpose, and the technique for quantitation is known as rank
annihilation (12-13). So far this method has been applied to
excitation-emission fluorescence (12-14) and to LC/UV Ql) with excellent
results. One problem in the calculation has been that an iterative solution
requiring many matrix diagonalizations was necessary. Lorber (J£) has
reported a noniterative solution, rank annihilation factor analysis (RAFA),
presenting the problem as a generalized eigenvalue problem in which a
direct solution is found by using the singular value decomposition.

In practice, $N_k$, the bilinear spectrum of a pure compound k, is
known, and $M$, the bilinear spectrum of a mixture sample in which
compound k is present, is measured. This data matrix $M$ can be expressed
as a linear combination of the bilinear spectra $N_k$ of the n pure
components:

$M = \sum_k \beta_k N_k$, where $N_k = x_k y_k^T$, $(N_k)_{ij} = x_{ik} y_{jk}$  (10)

The $x_k$ are column vectors with information in one order, e.g. excitation
spectra, and the $y_k^T$ are row vectors with information in the second
order, e.g. emission spectra. If $N_k$ is defined as the bilinear spectrum
of the pure component at unit concentration, then $\beta_k$ is the
concentration of the k-th compound.


If the data matrix $M$ has rank p, then subtracting from $M$ the right
amount of $N_k$, i.e. $\beta_k N_k$, gives a resultant matrix of rank p-1,
or in an equivalent manner

$\det(M - \beta_k N_k) = 0$  (11)

To solve this equation for $\beta_k$, the generalized eigenvalue problem is
applied. The matrices $M$ and $N_k$ are normally rectangular, so a
transformation is necessary.

Equation (11) can be rewritten as:

$N_k z = \lambda_k M z$  (12b)

where $\lambda_k = 1/\beta_k$.

Next the singular value decomposition of the $M$ matrix is obtained.

$M = U S V^T$  (13)

where

$M V = U S$  (14)

$M^T U = V S$  (15)

$M^T M V = V S^2$  eigen-equations in V space.  (16)

$M M^T U = U S^2$  eigen-equations in U space.  (17)


The second step is to determine the number of significant
eigenvalues p (equations 16-17) using abstract factor analysis (15).
To transform equation 12 to the normal eigenvalue equation, a new matrix
$\bar{M}$ is generated from $U$, $V$ and $S$ by taking the first p columns,

$\bar{M} = \bar{U} \bar{S} \bar{V}^T$  (18)

and substituting $\bar{M}$ for $M$ in equation 12b.

$N_k z = \lambda_k \bar{M} z$  (19)

$N_k z = \lambda_k \bar{U} \bar{S} \bar{V}^T z$  (20)

The eigenvector $z$ is replaced by $z = \bar{V} \bar{S}^{-1} z'$, where
$z' = \bar{S} \bar{V}^T z$; therefore

$N_k \bar{V} \bar{S}^{-1} z' = \lambda_k \bar{U} z'$  (21)

Left multiplying by $\bar{U}^T$ results in

$(\bar{U}^T N_k \bar{V}) \bar{S}^{-1} z' = \lambda_k (\bar{U}^T \bar{U}) z' = \lambda_k z'$

$(\bar{U}^T N_k \bar{V} \bar{S}^{-1}) z' = \lambda_k z'$  (22)

This is the normal eigenvalue equation, with the matrix
$(\bar{U}^T N_k \bar{V} \bar{S}^{-1})$ being square. Because the rank of
$N_k$ is one, there will be p-1 zero solutions for the eigenvalues.
Therefore, the only non-zero solution will be equal to the trace of the
matrix $(\bar{U}^T N_k \bar{V} \bar{S}^{-1})$. By calculating the trace of
the above matrix, the concentration of that component (the inverse of
$\lambda_k$) is solved directly. Other analytes can then be analyzed
sequentially.
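
The trace calculation can be sketched directly from equations 13 to 22. The function name `rafa_concentration` and the synthetic three-component data are hypothetical, and the number of significant components p is simply supplied here; it is not estimated by abstract factor analysis as the text describes.

```python
import numpy as np

def rafa_concentration(M, Nk, p):
    """Concentration of compound k from the mixture matrix M and the
    unit-concentration pure spectrum Nk, using p significant components."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    U, s, Vt = U[:, :p], s[:p], Vt[:p]               # truncate (equation 18)
    A = U.T @ Nk @ Vt.T @ np.diag(1.0 / s)           # p x p matrix (eq. 22)
    lam = np.trace(A)                                # only non-zero eigenvalue
    return 1.0 / lam                                 # concentration = 1/lambda

rng = np.random.default_rng(5)
x = rng.random((10, 3))          # e.g. excitation profiles of 3 compounds
y = rng.random((8, 3))           # e.g. emission profiles of 3 compounds
beta = np.array([0.5, 2.0, 1.3]) # made-up concentrations

# Equation 10: the mixture is a weighted sum of rank-one pure spectra.
M = sum(beta[k] * np.outer(x[:, k], y[:, k]) for k in range(3))
N0 = np.outer(x[:, 0], y[:, 0])  # pure spectrum of the analyte of interest

beta0_est = rafa_concentration(M, N0, p=3)
```

Only the pure spectrum of the analyte of interest is needed; the other two components act as an uncalibrated background and do not have to be known.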

It has been shown that a wide variety of data analysis techniques can
be applied to sensor array data. Since each method offers different
advantages, it is essential to choose the technique that maximizes the
information received while minimizing error, at the cost of programming
complexity. The above comparison of multivariate techniques shows that,
in moving toward sensor arrays, problems such as interferences, if known,
can be calibrated into a model for quantitation. Furthermore, the more
complex the response data of a sensor array (time responses, for
example), the more useful information can be extracted, as seen in RAFA.
This type of data analysis could lead to a sensor array on a single
silicon chip which could directly measure 10 to 20 blood constituents
intravenously in minutes. A number of studies are underway in our
laboratory aimed at extending the calibration mathematics described above
and applying them to a variety of sensor arrays for process monitoring
and control.

FIGURE 1. Data blocks or matrices used in multivariate regression.

FIGURE 2. The predictive aspect of PLS. The top half of the diagram
is used for the calibration data, while the bottom half is for
the estimation for test data. X and Y are calibration data,
X' is the unknown response data, and $\hat{Y}$ is the unknown
concentration data.